Entropy Controlled Non-Stationarity for Improving Performance of Independent Learners in Anonymous MARL Settings

Authors

  • Tanvi Verma
  • Pradeep Varakantham
  • Hoong Chuin Lau
Abstract

With the advent of sequential matching (of supply and demand) systems (Uber, Lyft, Grab for taxis; Uber Eats, Deliveroo, etc. for food; Amazon Prime, Lazada, etc. for groceries) across many online and offline services, individuals (taxi drivers, delivery boys, delivery van drivers, etc.) earn more by being at the "right" place at the "right" time. We focus on learning techniques for providing guidance (on the right locations to be at the right times) to individuals in the presence of other "learning" individuals. Interactions between individuals are anonymous, i.e., the outcome of an interaction (competing for demand) is independent of the identity of the agents, and therefore we refer to these as Anonymous MARL settings. Existing research of relevance is on independent learning using Reinforcement Learning (RL) or on Multi-Agent Reinforcement Learning (MARL). The number of individuals in aggregation systems is extremely large, and individuals have their own selfish interest (of maximising revenue). Therefore, traditional MARL approaches either do not scale or rely on assumptions of a common objective or action coordination that are not viable. In this paper, we focus on improving the performance of independent reinforcement learners, specifically the popular Deep Q-Networks (DQN) and Advantage Actor Critic (A2C) approaches, by exploiting anonymity. Specifically, we control the non-stationarity introduced by other agents using the entropy of the agent density distribution. We demonstrate a significant improvement in revenue for individuals and for all agents together with our learners on a generic experimental setup for aggregation systems and a real-world taxi dataset.
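
The abstract does not say how the entropy is computed or how it enters the DQN and A2C learners. As a rough illustration of the quantity itself, the sketch below assumes a discrete set of zones, takes the agent density distribution to be the fraction of agents in each zone, and uses Shannon entropy. The function names, the zone indexing, and the example data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def density_distribution(agent_zones, num_zones):
    """Fraction of agents in each zone (assumed zones: 0..num_zones-1)."""
    if not agent_zones:
        raise ValueError("agent_zones must contain at least one agent")
    counts = Counter(agent_zones)
    total = len(agent_zones)
    return [counts.get(z, 0) / total for z in range(num_zones)]


def entropy(distribution):
    """Shannon entropy in nats; empty zones contribute nothing."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


# Hypothetical data: 8 agents over 4 zones, evenly spread vs. crowded.
spread_out = density_distribution([0, 1, 2, 3, 0, 1, 2, 3], num_zones=4)
crowded = density_distribution([0, 0, 0, 0, 0, 0, 0, 1], num_zones=4)

print("entropy (spread out):", entropy(spread_out))
print("entropy (crowded):   ", entropy(crowded))
```

Under these assumptions, an even spread of agents gives the maximum entropy (the natural log of the number of zones), while crowding into a few zones gives a lower value. How such a value would be used to control non-stationarity during learning is the subject of the paper and is not shown here.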

Publication date: 2018